Update trtllm-gen fused moe routing kernel and add more kernels #1955
9d9ad95 to 7cd156d
Signed-off-by: jiahanc <[email protected]>
Signed-off-by: Siyuan Fu <[email protected]>
b79b913 to a5f9585
Signed-off-by: Siyuan Fu <[email protected]>
f060ab9 to e7ac015
Signed-off-by: Siyuan Fu <[email protected]>
Signed-off-by: Siyuan Fu <[email protected]>
Signed-off-by: Siyuan Fu <[email protected]>
📌 Description
- `computeSelectedTileN`
- `tune_max_num_tokens` to FP8 per-tensor and FP8 block scale

🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
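- I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- I have installed the hooks with `pre-commit install`.
- I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests
- All tests are passing (`unittest`, etc.).

Reviewer Notes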